■ Autonomous vehicles almost “solved”
  ● But “almost” is misleading
■ Huge challenge: safety
  ● AVs present additional challenges
  ● Perception edge cases are a limiting factor
  ● Testing alone won't get us to safety
■ Safety requires a standards + safety case approach
  ● Life cycle argument supporting deployment safety
  ● ANSI/UL 4600 standard for #DidYouThinkofThat?

■ Safety is a system property
  ● Correctness is not enough for safety
■ Safety engineering emphasis on hazard mitigation
  ● Identify hazards: if X goes wrong, it could result in a loss event
    - Includes hardware failures, tool defects, environmental surprises
  ● Predict risk = probability * consequence
    - The tricky part is: “Probably Never * Catastrophic”
  ● Mitigate risk via:
    - Engineering rigor: process quality, analysis, test, redundancy patterns
    - Functional safety: detect and shut down malfunctioning equipment
    - Safety of the Intended Function (SOTIF): resilience to requirements gaps, inconsistent sensor data, unexpected environments
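
The "risk = probability * consequence" bullet can be illustrated with a small sketch. The hazard names, probabilities, and severity scores below are made up for illustration and are not from the talk. The sketch shows why "Probably Never * Catastrophic" is tricky: the product looks modest on paper, but it is very sensitive to any error in the tiny probability estimate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    probability: float  # assumed chance of the loss event per operating hour
    consequence: float  # assumed severity on an arbitrary cost scale

    @property
    def risk(self) -> float:
        # Risk as stated on the slide: probability * consequence.
        return self.probability * self.consequence


# Hypothetical hazards; every number here is invented for illustration.
frequent_minor = Hazard("phantom braking, minor injury", 1e-4, 1e3)
rare_catastrophic = Hazard("undetected pedestrian, fatality", 1e-9, 1e8)

# On paper the two hazards carry about the same risk.
same_on_paper = math.isclose(frequent_minor.risk, rare_catastrophic.risk)

# If the "probably never" estimate is off by 100x, the rare hazard dominates.
misjudged = Hazard(rare_catastrophic.name, 1e-7, rare_catastrophic.consequence)
ratio = misjudged.risk / frequent_minor.risk

print(same_on_paper, ratio)
```

The frequent hazard's probability can be checked against field data fairly quickly; the rare one cannot, which is where the estimate is most likely to be wrong.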

Why Is AV Safety Complicated?
■ Public expectations
  ● Expect super-human machine performance
  ● Trust too easily given, backlash when broken
■ Technical challenges
  ● Machine learning safety is a work in progress
■ Historical industry culture clash
  ● Autonomy researchers: it's all about the cool small-scale demo
  ● Silicon Valley: move fast + break things
  ● Automotive: blame driver for not mitigating equipment failures
  ● Regulators: test-centric; weak digital safety expertise
  ● Heaviest technical lift is perception/prediction safety

Perception Builds the World Model
[Figure: diagram labeled “Perception”, “Computer's World Model”, and “Path Planning”]
Perception & prediction present a uniquely difficult assurance challenge

Edge Cases As A Limiting Factor
■ Machine learning is best at what it has already seen
  ● But the world is full of novelty
  ● Perception/prediction is poor at recognizing it is just guessing
■ Is this a person or a chicken?
[Figure: image classifier output: bird 0.997, feather 0.978, nature 0.963, poultry 0.954, outdoors 0.936, color 0.910]
■ Edge cases are surprises
  ● You won't see these in testing
  => Edge cases are the stuff you didn't think of!
  ● Have you covered the possible unknowns?

  ● Good for identifying “easy” cases
  ● Expensive and potentially dangerous

Autonomy Testing Risks
■ Uber ATG fatality, Tempe AZ/US: March 2018
  ● Uber ATG closed: January 2021
■ Local Motors injury, Whitby CA: Dec. 2021
  ● Company closed: Jan. 2022
■ Pony.ai crash, CA/US: Oct. 2021
  ● Uncrewed test permit revoked
■ WeRide sleeping test driver: Oct. 2021
  ● Company deflects issue / no apparent regulator action
■ EasyMile shuttle phantom braking injuries (2019, 2020)
■ SAE J3018 standard for testing safety (2015; 2020 update)
  ● Only Argo AI publicly pledges conformance
© 2022 Philip Koopman

Brute Force Road Testing
■ If 100M miles/critical mishap...
  ● Test 3x-10x longer than mishap rate
  => Need 1 billion miles of testing
■ That's ~25 round trips on every road in the world
  ● With fewer than 10 critical mishaps
  ● Start over for each software update
=> Brute force testing impracticable
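
The "3x" end of the 3x-10x multiplier can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes critical mishaps follow a Poisson process and that the test campaign sees zero mishaps; that statistical model and the function name are illustrative assumptions, not something stated on the slide.

```python
import math


def miles_needed_zero_mishaps(miles_per_mishap: float, confidence: float) -> float:
    """Mishap-free miles needed to claim, at the given confidence, that the
    true rate is no worse than one mishap per `miles_per_mishap`.

    Assumes a Poisson process: P(no mishap in m miles) = exp(-m / miles_per_mishap).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) * miles_per_mishap


target = 100e6  # 100M miles per critical mishap, as on the slide

m95 = miles_needed_zero_mishaps(target, 0.95)
m99 = miles_needed_zero_mishaps(target, 0.99)

print(f"95% confidence: {m95 / target:.2f}x the target interval")
print(f"99% confidence: {m99 / target:.2f}x the target interval")
```

Under these assumptions, about three times the target interval is needed even when nothing goes wrong. Any mishap observed during the campaign pushes the requirement higher, which is consistent with the slide budgeting up to 10x.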

  ● Safer, but expensive
  ● Not scalable

  ● Highly scalable; less expensive than road testing
  ● Simulation validation (“tool qualification”)

How Much Do You Trust Simulation?
■ Would you put your child in front of this self-driving car:
  ● 10,000M simulation miles
    ... perhaps with a simulator error?
  ● 100M miles of data collected
    ... perhaps missing some relevant scenarios?
  ● 10M miles of road testing
    ... that missed high-risk situations?
  ● Designed with research-quality tooling
    ... with no safety qualification?
  ● With 5% labeling errors in training data?
■ Need simulation and other tool qualification

Industry Safety Standards Can Help
■ ISO 26262: Functional Safety
  ● Covers run-time faults & design defects
  ● Assumes complete requirements known
■ ISO 21448: SOTIF
  ● SOTIF: “Safety Of The Intended Function”
  ● Iteratively mitigate discovered “unknowns”
■ Also need: #DidYouThinkofThat? lists
  ● A technically substantive safety argument
  ● Evidence of coverage initially + feedback from surprises
  ● Continuously improve based on lessons learned
  ● A way to organize everything to ensure safety

Safety Cases To Organize Safety Argument
■ Claim: a property of the system
  ● “System avoids pedestrians”
■ Argument: why this is true
  ● “Detect & maneuver to avoid”
■ Evidence: supports argument
  ● Tests, analysis, simulations, ...
■ Sub-claims/arguments address complexity
  ● “Detects pedestrians” // evidence
  ● “Maneuvers around detected pedestrians” // evidence
  ● “Stops if can't maneuver” // evidence
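
One way to picture the claim / argument / evidence structure is as a tree. The sketch below is a hypothetical data structure, not a format defined by UL 4600 or by the talk; the class name, field names, and evidence entries are placeholders. It walks the tree to find leaf claims that have no supporting evidence yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    text: str                                   # property asserted about the system
    argument: str = ""                          # why the claim is believed true
    evidence: List[str] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)

    def unsupported(self) -> List[str]:
        """Return the text of every leaf claim that has no evidence attached."""
        if self.subclaims:
            missing: List[str] = []
            for sub in self.subclaims:
                missing.extend(sub.unsupported())
            return missing
        return [] if self.evidence else [self.text]


# The pedestrian example from the slide; evidence entries are placeholders.
top = Claim(
    text="System avoids pedestrians",
    argument="Detect & maneuver to avoid",
    subclaims=[
        Claim("Detects pedestrians", evidence=["tests", "simulations"]),
        Claim("Maneuvers around detected pedestrians", evidence=["analysis"]),
        Claim("Stops if can't maneuver"),  # no evidence recorded yet
    ],
)

print(top.unsupported())
```

A real safety case also needs the argument at each level to be reviewed for soundness; a check like this only flags structural gaps.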

Lifecycle, Maintenance & Supply Chain
■ Safety related maintenance
  ● What maintenance is required for safety?
  ● How do you know it is done effectively?
■ Safety related aspects of lifecycle
  ● Requirements/design/ML training
  ● Handoff to manufacturing; deployment
  ● Supply chain
  ● Field modifications & updates
  ● Operation, retirement & disposal
■ Safety case kept updated during system lifecycle

UL 4600: An Autonomy Safety Standard
■ Evaluation of a Safety Case
  ● Independently assess safety case
  ● Mix & match supporting standards
  ● Discourages questionable practices
  ● Extensive #DidYouThinkofThat? lists
[Figure: cover of ANSI/UL 4600 2nd Edition, “Evaluation of Autonomous Products”. Edition Date: March 15, 2022. ANSI Approved: March 15, 2022]
■ “Unknowns” are first class citizens
  ● Balance between analysis & field experience
  ● Field monitoring used for continual safety case improvement
  ● Assessment findings & field data used to update practices
■ ANSI/UL 4600 2nd Edition issued March 2022
  ● 3rd edition to address heavy trucks in progress

The Path To Achieving AV Safety
■ Cultural reconciliation within industry
  ● Safety for on-road testing (driver & vehicle)
  ● Mature beyond a rushed demo mentality
■ Stakeholder trust for acceptable safety
  ● System-level safety for machine learning
  ● Independent safety assessments
■ Use industry safety standards
  ● Reform “standards optional” regulations
  ● Traditional software safety ... PLUS ...
    - Account for unknown unknowns at deployment
  ● UL 4600 Autonomous Vehicle Safety Standard

BoF Discussion Starters
Autonomous Vehicles and Software Safety Engineering
■ Should software developers share blame for a fatality?
  ● Ethics of when to deploy “beta” software on public roads
■ Machine learning: how do we:
  ● Ensure training data coverage of operational domain
  ● Account for high-risk heavy tail events (see SEAMS talk)
■ Commercial/research software for life critical systems
  ● Simulator software & simulation object models
  ● Machine learning development toolchains
  ● DevOps, cloud infrastructure, and SaaS toolchains
■ Gaps between ICSE research results and AV system level safety
[Sidebar: Trolley Problem is irrelevant for practical AVs; link ending in 30YIMc1k2Xw]